The purpose of this homework is to familiarize yourself with bipartite graphs and two-mode networks, projection, and weighted matrices in R.
For both the PaulRevereNet network and the
PhilKidnapNet network, do the following:
gplot() function.N and M) is most
central in the network.Note: for 11, you can access the vertex attribute for the
PaulRevereNet network using the
PaulRevereNet %v% "people.names" and
PaulRevereNet %v% "place.names", respectively. For the
PhilKidnapNet network, the names of the nodes are stored as
a vertex attribute called vertex.names.
For both the PaulRevereNet network and the
PhilKidnapNet network, do the following:
matrix from the network. Use
the as.sociomatrix() function in the network
package to do so.network from the dichotomized
“person” matrix. Use this object for steps 6-11.PaulRevereNet NetworkFirst, let’s read in the PaulRevereNet network. This is
stored as an .rds document in the data folder on the SNA
Textbook site. We will use the readRDS() function, with
the file
path, to load the file. Since we are calling a url, we need to use
the url() function as well.
Finally, we need to make sure the sna and
network packages are loaded, using library(),
so that R recognizes the PaulRevereNet object as one of
class network.
# clear the workspace
rm( list = ls() )
# load the libraries we need
library( sna )
library( network )
# define the path location for the file
loc <- "https://github.com/jacobtnyoung/sna-textbook/raw/main/data/data-paul-revere-net.rds"
PaulRevereNet <- readRDS( url( loc ) )
# look at the network
PaulRevereNet## Network attributes:
## vertices = 261
## directed = FALSE
## hyper = FALSE
## loops = FALSE
## multiple = FALSE
## bipartite = 254
## total edges= 334
## missing edges= 0
## non-missing edges= 334
##
## Vertex attribute names:
## names people.names place.names vertex.names
##
## No edge attributes
gplot() function.# set the seed to reproduce the plot layout
set.seed( 605 )
# execute the plot
gplot(
PaulRevereNet, # our network to plot
gmode = "twomode", # indicate it is two modes
usearrows = FALSE, # turn off the arrowheads
vertex.cex=2, # size the nodes
main="Paul Revere Network" # add a title
)# create a matrix from the network object
PaulRevereMat <- as.matrix( PaulRevereNet )
# identify the number of edges in the graph
L <- sum( PaulRevereMat )
# identify the number of actors in the example
N <- dim( PaulRevereMat )[1]
# identify the number of events in the example
M <- dim( PaulRevereMat )[2]
# calculate the density
density.PaulRevere <- L / ( N * M )
# check it out
density.PaulRevere## [1] 0.1878515
The density of for the Paul Revere network indicates that 19 percent of the ties that could be observed, were observed. Overall, we can say that the network is fairly sparse.
# raw scores for actors
actor.deg <- rowSums( PaulRevereMat )
# raw scores for events
event.deg <- colSums( PaulRevereMat )# an alternative is to just use the mean() function with the degree data
mean( actor.deg )
mean( event.deg )For the first set of nodes, the individuals who could attend events at locations, the mean degree is 1.31, meaning that on average, individuals attended 1.31 events.
For the second set of nodes, the locations where individuals could meet, the mean degree is 47.71, meaning that each location, on average, had 47.71 attendees.
# standardized score for actors
s.actor.deg <- actor.deg / M
# standardized score for events
s.event.deg <- event.deg / N## [1] 0.1878515
## [1] 0.1878515
Notice that these are the same because we adjusted for the different sizes of M and N in the network.
# define the standardized scores for actors
actor.size <- actor.deg / M
# define the standardized scores for events
event.size <- event.deg / N
# combine these to use in the plot
v.size <- c( actor.size, event.size )
# set the seed to reproduce the plot layout
set.seed( 605 )
# execute the plot
gplot(
PaulRevereNet, # our network to plot
gmode = "twomode", # indicate it is two modes
usearrows = FALSE, # turn off the arrowheads
main="Paul Revere Network", # add a title
vertex.cex = v.size + 0.5 # set the size (add 0.5 so it is not too small)
)We use the tnet package has the
reinforcement_tm() function to calculate the dyadic
clustering score. Recall that this package conflicts with
sna and network. So, we will need to make sure
we load those packages again below.
# load tnet
library( tnet )
# coerce the network object to work with tnet
paul.revere.tnet <- as.tnet( as.matrix( PaulRevereNet ) )
# get the reinforcement score
reinforcement_tm( paul.revere.tnet )## [1] 0.1709114
The value of 0.17 indicates that 17 percent of the 3-paths are closed (i.e. 4-cycles). In other words, there is not a lot of clustering in terms of involvement at the same locations.
N and M) is
most central in the network.# set the seed to reproduce the plot layout
set.seed( 605 )
# execute the plot
gplot(
PaulRevereNet,
gmode = "twomode",
usearrows = FALSE,
main="Paul Revere Network",
vertex.cex = v.size + 0.5,
label = PaulRevereNet %v% "people.names", # add the individual names
label.cex = 0.4, # change the label size
label.pos = 5 # set the label position
)This is hard to see! One way around this is to just show labels for the those nodes that have degree centrality scores at or above the mean. Let’s see how we can do this.
First, identify who is the most central person. We can do this using
the sort() function and the tail() function.
(Of course, you could reverse the sorting and use the
head() function).
## Revere.Paul
## 0.7142857
We see it is Paul Revere!
Now, we want to change the names to only show Paul Revere’s name.
# create a vector of names to change
actor.adj.names <- PaulRevereNet %v% "people.names"
# if the name is not Paul Revere, make it NA
actor.adj.names[ actor.adj.names != "Paul Revere" ] <- NANow that we have adjusted the names, we can plot it.
# set the seed to reproduce the plot layout
set.seed( 605 )
# execute the plot
gplot(
PaulRevereNet,
gmode = "twomode",
usearrows = FALSE,
main="Paul Revere Network",
vertex.cex = v.size + 0.5,
label = actor.adj.names, # here we use the new names
label.cex = 0.8, # change the label size
label.pos = 5
)# set the seed to reproduce the plot layout
set.seed( 605 )
# execute the plot
gplot(
PaulRevereNet,
gmode = "twomode",
usearrows = FALSE,
main="Paul Revere Network",
vertex.cex = v.size + 0.5,
label = PaulRevereNet %v% "place.names", # use the place names
label.cex = 1.2,
label.pos = 5
)Again, this is a bit tricky to see from the plot. But, we
can examine the degree centrality scores as we did above with the
sort() function and the tail() function.
## TeaParty
## 0.3267717
We see it is the “TeaParty” location. We can see this a bit better in
our plot by adjusting the color of the edges and
expanding the size of the labels using the rescale()
function.
# create the function
rescale <- function( nchar, low, high ){
min_d <- min( nchar )
max_d <- max( nchar )
rscl <- ( ( high - low )*( nchar - min_d ) ) / ( max_d - min_d ) + low
rscl
}
# set the seed to reproduce the plot layout
set.seed( 605 )
# execute the plot
gplot(
PaulRevereNet,
gmode = "twomode",
usearrows = FALSE,
main="Paul Revere Network",
vertex.cex = v.size + 0.5,
label = PaulRevereNet %v% "place.names",
label.cex = rescale( v.size, 0.3, 1.8 ), # pass the rescale function to the label.cex argument
label.pos = 5,
edge.col = "grey80" # change the color of the edges
)PhilKidnapNet NetworkNext, let’s read in the PhilKidnapNet network. This is
stored as an .rds document in the data folder on the SNA
Textbook site. We will use the readRDS() function, with
the file
path, to load the file. Since we are calling a url, we need to use
the url() function as well.
Finally, we need to make sure the sna and
network packages are loaded, using library(),
so that R recognizes the PhilKidnapNet object as one of
class network.
# clear the workspace since we will recycle names
rm( list = ls() )
# load the libraries we need
library( sna )
library( network )
# define the path location for the file
loc <- "https://github.com/jacobtnyoung/sna-textbook/raw/main/data/data-philippine-kidnappings-net.rds"
PhilKidnapNet <- readRDS( url( loc ) )
# look at the network
PhilKidnapNet## Network attributes:
## vertices = 351
## directed = FALSE
## hyper = FALSE
## loops = FALSE
## multiple = FALSE
## bipartite = 246
## total edges= 402
## missing edges= 0
## non-missing edges= 402
##
## Vertex attribute names:
## vertex.names
##
## No edge attributes
gplot() function.# set the seed to reproduce the plot layout
set.seed( 605 )
# execute the plot
gplot(
PhilKidnapNet, # our network to plot
gmode = "twomode", # indicate it is two modes
usearrows = FALSE, # turn off the arrowheads
vertex.cex=2, # size the nodes
main="Kidnapping Network" # add a title
)# create a matrix from the network object
PhilKidnapMat <- as.matrix( PhilKidnapNet )
# identify the number of edges in the graph
L <- sum( PhilKidnapMat )
# identify the number of actors in the example
N <- dim( PhilKidnapMat )[1]
# identify the number of events in the example
M <- dim( PhilKidnapMat )[2]
# calculate the density
density.PhilKidnap <- L / ( N * M )
# check it out
density.PhilKidnap## [1] 0.0155633
The density of for the kidnapping network indicates that 2 percent of the ties that could be observed, were observed. We can see this in the plot in that there are a number of events that only involve a single person. Overall, we can say that the network is fairly sparse.
# raw scores for actors
actor.deg <- rowSums( PhilKidnapMat )
# raw scores for events
event.deg <- colSums( PhilKidnapMat )# an alternative is to just use the mean() function with the degree data
mean( actor.deg )
mean( event.deg )For the first set of nodes, the individuals who could attend events at locations, the mean degree is 1.63, meaning that on average, individuals participated in 1.63 kidnappings.
For the second set of nodes, the kidnapping incidents, the mean degree is 3.83, meaning that each incident, on average, had 3.83 individuals involved in the kidnapping.
# standardized score for actors
s.actor.deg <- actor.deg / M
# standardized score for events
s.event.deg <- event.deg / N## [1] 0.0155633
## [1] 0.0155633
Notice that these are the same because we adjusted for the different sizes of M and N in the network.
# define the standardized scores for actors
actor.size <- actor.deg / M
# define the standardized scores for events
event.size <- event.deg / N
# combine these to use in the plot
v.size <- c( actor.size, event.size )
# set the seed to reproduce the plot layout
set.seed( 605 )
# execute the plot
gplot(
PhilKidnapNet, # our network to plot
gmode = "twomode", # indicate it is two modes
usearrows = FALSE, # turn off the arrowheads
main="Kidnapping Network", # add a title
vertex.cex = v.size + 0.5 # set the size (add 0.5 so it is not too small)
)We use the tnet package has the
reinforcement_tm() function to calculate the dyadic
clustering score. Recall that this package conflicts with
sna and network. So, we will need to make sure
we load those packages again below.
# load tnet
library( tnet )
# coerce the network object to work with tnet
kidnapping.tnet <- as.tnet( as.matrix( PhilKidnapNet ) )
# get the reinforcement score
reinforcement_tm( kidnapping.tnet )## [1] 0.122801
The value of 0.12 indicates that 12 percent of the 3-paths are closed (i.e. 4-cycles). This means there is very little overlap in multiple kidnapping events by several individuals.
N and M) is
most central in the network.# create the function
rescale <- function( nchar, low, high ){
min_d <- min( nchar )
max_d <- max( nchar )
rscl <- ( ( high - low )*( nchar - min_d ) ) / ( max_d - min_d ) + low
rscl
}
# set the seed to reproduce the plot layout
set.seed( 605 )
# execute the plot
gplot(
PhilKidnapNet,
gmode = "twomode",
usearrows = FALSE,
main="Kidnapping Network",
vertex.cex = rescale( v.size, 0.5, 1.8 ),
label = PhilKidnapNet %v% "vertex.names", # add the individual names
label.cex = rescale( v.size, 0.3, 1.8 ), # pass the rescale function to the label.cex argument
label.pos = 5, # set the label position
edge.col = "grey80" # change the color of the edges
)From the plot, we can see that 103 is the most central actor indicating that this person attended the most meetings. And, we see that 35 is the most central event meaning that this meeting had the most attendees.
PaulRevereNet Network# clear the workspace
rm( list = ls() )
# detach the packages we used above
detach( package: tnet )
detach( package: igraph )
# load the libraries we need
library( sna )
library( network )
# define the path location for the file
loc <- "https://github.com/jacobtnyoung/sna-textbook/raw/main/data/data-paul-revere-net.rds"
PaulRevereNet <- readRDS( url( loc ) )
# look at the network
PaulRevereNet## Network attributes:
## vertices = 261
## directed = FALSE
## hyper = FALSE
## loops = FALSE
## multiple = FALSE
## bipartite = 254
## total edges= 334
## missing edges= 0
## non-missing edges= 334
##
## Vertex attribute names:
## names people.names place.names vertex.names
##
## No edge attributes
matrix from the network.
Use the as.sociomatrix() function in the
network package to do so.Let’s use the edge.rescale() function to help us here.
This function returns a weighted edgelist that can be used to aid with
plotting.
edge.rescale <- function( uniMat, low, high ){
diag( uniMat ) <- 0
min_w <- min( uniMat[uniMat != 0] )
max_w <- max( uniMat[uniMat != 0] )
rscl <- ( ( high-low ) * ( uniMat[uniMat != 0] - min_w ) ) / ( max_w - min_w ) + low
rscl
}
edge.shade <- function( uniMat ){
net.edges <- edge.rescale( uniMat, 0.01, 1 )
vec.to.color <- as.vector( abs( net.edges ) )
vec.to.color <- 1 - vec.to.color # subtract 1 to flip the grey function scale.
edge.cols <- grey( vec.to.color )
return( edge.cols )
}Now, plug it into the plot.
# set the plot regions to ease with visualization
par(
mfrow = c( 2, 2 ),
mar = c( 2, 1, 4, 1)
)
# set the seed to reproduce the plot
set.seed( 605 )
# plot the bipartite network
gplot( PaulRevereMat,
gmode="twomode",
usearrows=FALSE,
edge.col="grey60",
edge.lwd = 1.2,
vertex.col = c(
rep( "#34f772", dim( PaulRevereMat )[1] ),
rep( "#656ffc", dim( PaulRevereMat )[2] ) )
)
title( "Bipartite Matrix of Paul Revere Network", line = 1 )
# plot the person matrix
gplot( PaulRevereMatP,
gmode = "graph",
usearrows = FALSE,
edge.col = edge.shade( PaulRevereMatP ),
edge.lwd = edge.rescale( PaulRevereMatP, 0.0001, 10 ),
vertex.col = "#34f772"
)
title( "Unipartite Projection of\n Individuals (Person) Matrix", line = 1 )
# plot the group matrix
gplot( PaulRevereMatG,
gmode = "graph",
usearrows = FALSE,
edge.col = edge.shade( PaulRevereMatG ),
edge.lwd = edge.rescale( PaulRevereMatP, 0.0001, 10 ),
vertex.col = "#656ffc",
vertex.sides = 4
)
title( "Unipartite Projection of\n Locations (Group) Matrix", line = 1 )network from the
dichotomized “person” matrix. Use this object for steps 6-11.# use the degree() function in the sna package
deg <- degree(
PaulReverePnet,
gmode = "graph",
cmode = "degree"
)Recall that the closeness() function reports the
standardized closeness centrality by default. If we want the
unstandardized closeness, we can just divide the result returned by
closeness() by g-1.
# set the number of nodes in the network
g <- dim( as.matrix( PaulReverePnet ) )[1]
# raw closeness centrality
close <- closeness( PaulReverePnet, gmode="graph" ) / ( g - 1 )
# raw betweenness centrality
btwn <- betweenness(
PaulReverePnet,
gmode="graph"
)# create a data frame that shows the values for each individual
cent.dat <- data.frame(
degree = deg,
closeness = close,
betweenness = btwn
)# standardized score is deg / g-1 where g is the number of nodes
s.deg <- deg / ( g - 1 )
# standardized closeness is the score reported by default
s.close <- closeness( PaulReverePnet, gmode="graph" )
# standardized betweenness is the betweenness / ( ( ( g-1 ) * ( g-2 ) ) / 2 )
s.btwn <- btwn / ( ( ( g-1 ) * ( g-2 ) ) / 2 )# create a data frame that shows the values for each individual
s.cent.dat <- data.frame(
s.degree = s.deg,
s.closeness = s.close,
s.betweenness = s.btwn
)# use the mean() function to calculate the means
mean.deg <- mean( deg )
mean.close <- mean( close )
mean.btwn <- mean( btwn )
# create a table that is easy to read
cent.tab <- data.frame(
degree = round( mean.deg, 3 ),
close = round( mean.close, 3 ),
between = round( mean.btwn, 3 )
)
# print the table
cent.tabThe mean degree centrality score is 80.48 meaning that the average node has 80.48 edges incident on them. The mean closeness centrality score is 0.002. meaning that the average node is far from all other nodes in the network. Finally, the mean betweenness centrality score is 86.89, meaning that the average node sits on 86.89 paths between i and j.
dcent <- centralization( PaulReverePnet, degree, mode="graph", cmode="degree" )
ccent <- centralization( PaulReverePnet, closeness, mode="graph" )
bcent <- centralization( PaulReverePnet, betweenness, mode="graph" )
# create a table that is easy to read
centralization.tab <- data.frame(
deg = round( dcent, 2 ),
close = round( ccent, 2 ),
between = round( bcent, 2 )
)
# print the table
centralization.tabThe degree centralization score is 0.67 and the closeness centralization score is 0.77. Both are somewhat close to 1 indicating that the nodes differ substantially in terms of their scores. We can see that Paul Revere accounts for most of this due to his high degree of 248. Recall that there are only 254 individuals in the network. Paul Revere is connected to nearly 98 of them.
The betweenness centralization score is much closer to zero, at 0.11 meaning that there is very little variation between nodes in terms of their scores. Again, looking at the graph, you can see that there are few places where a single actor occupies a key bridging position.
PhilKidnapNet Network# clear the workspace
rm( list = ls() )
# load the libraries we need
library( sna )
library( network )
# define the path location for the file
loc <- "https://github.com/jacobtnyoung/sna-textbook/raw/main/data/data-philippine-kidnappings-net.rds"
PhilKidnapNet <- readRDS( url( loc ) )
# look at the network
PhilKidnapNet## Network attributes:
## vertices = 351
## directed = FALSE
## hyper = FALSE
## loops = FALSE
## multiple = FALSE
## bipartite = 246
## total edges= 402
## missing edges= 0
## non-missing edges= 402
##
## Vertex attribute names:
## vertex.names
##
## No edge attributes
matrix from the network.
Use the as.sociomatrix() function in the
network package to do so.Let’s use the edge.rescale() function to help us here.
This function returns a weighted edgelist that can be used to aid with
plotting.
edge.rescale <- function( uniMat, low, high ){
diag( uniMat ) <- 0
min_w <- min( uniMat[uniMat != 0] )
max_w <- max( uniMat[uniMat != 0] )
rscl <- ( ( high-low ) * ( uniMat[uniMat != 0] - min_w ) ) / ( max_w - min_w ) + low
rscl
}
edge.shade <- function( uniMat ){
net.edges <- edge.rescale( uniMat, 0.01, 1 )
vec.to.color <- as.vector( abs( net.edges ) )
vec.to.color <- 1 - vec.to.color # subtract 1 to flip the grey function scale.
edge.cols <- grey( vec.to.color )
return( edge.cols )
}Now, plug it into the plot.
# set the plot regions to ease with visualization
par(
mfrow = c( 2, 2 ),
mar = c( 2, 1, 4, 1)
)
# set the seed to reproduce the plot
set.seed( 605 )
# plot the bipartite network
gplot( PhilKidnapMat,
gmode="twomode",
usearrows=FALSE,
edge.col="grey60",
edge.lwd = 1.2,
vertex.col = c(
rep( "#34f772", dim( PhilKidnapMat )[1] ),
rep( "#656ffc", dim( PhilKidnapMat )[2] ) )
)
title( "Bipartite Matrix of Kidnapping Network", line = 1 )
# plot the person matrix
gplot( PhilKidnapMatP,
gmode = "graph",
usearrows = FALSE,
edge.col = edge.shade( PhilKidnapMatP ),
edge.lwd = edge.rescale( PhilKidnapMatP, 0.0001, 10 ),
vertex.col = "#34f772"
)
title( "Unipartite Projection of\n Individuals (Person) Matrix", line = 1 )
# plot the group matrix
gplot( PhilKidnapMatG,
gmode = "graph",
usearrows = FALSE,
edge.col = edge.shade( PhilKidnapMatG ),
edge.lwd = edge.rescale( PhilKidnapMatG, 0.0001, 10 ),
vertex.col = "#656ffc",
vertex.sides = 4
)
title( "Unipartite Projection of\n Meetings (Group) Matrix", line = 1 )network from the
dichotomized “person” matrix. Use this object for steps 6-11.# use the degree() function in the sna package
deg <- degree(
PhilKidnapPnet,
gmode = "graph",
cmode = "degree"
)Recall that the closeness() function reports the
standardized closeness centrality by default. If we want the
unstandardized closeness, we can just divide the result returned by
closeness() by g-1.
# set the number of nodes in the network
g <- dim( as.matrix( PhilKidnapPnet ) )[1]
# raw closeness centrality
close <- closeness( PhilKidnapPnet, gmode="graph" ) / ( g - 1 )
# raw betweenness centrality
btwn <- betweenness(
PhilKidnapPnet,
gmode="graph"
)# create a data frame that shows the values for each individual
cent.dat <- data.frame(
degree = deg,
closeness = close,
betweenness = btwn
)# standardized score is deg / g-1 where g is the number of nodes
s.deg <- deg / ( g - 1 )
# standardized closeness is the score reported by default
s.close <- closeness( PhilKidnapPnet, gmode="graph" )
# standardized betweenness is the betweenness / ( ( ( g-1 ) * ( g-2 ) ) / 2 )
s.btwn <- btwn / ( ( ( g-1 ) * ( g-2 ) ) / 2 )# create a data frame that shows the values for each individual
s.cent.dat <- data.frame(
s.degree = s.deg,
s.closeness = s.close,
s.betweenness = s.btwn
)# use the mean() function to calculate the means
mean.deg <- mean( deg )
mean.close <- mean( close )
mean.btwn <- mean( btwn )
# create a table that is easy to read
cent.tab <- data.frame(
degree = round( mean.deg, 3 ),
close = round( mean.close, 3 ),
between = round( mean.btwn, 3 )
)
# print the table
cent.tabThe mean degree centrality score is 20.902 meaning that the average node has 20.902 edges incident on them.
The mean closeness centrality score is 0. Actually, if you look at the scores you will see that all the nodes have a score of 0. Why? Because the network is disconnected. There are nodes with infinite distance, so this renders the closeness score to be zero.
Finally, the mean betweenness centrality score is 176.329, meaning that the average node sits on 176.329 paths between i and j.
dcent <- centralization( PhilKidnapPnet, degree, mode="graph", cmode="degree" )
ccent <- centralization( PhilKidnapPnet, closeness, mode="graph" )
bcent <- centralization( PhilKidnapPnet, betweenness, mode="graph" )
# create a table that is easy to read
centralization.tab <- data.frame(
deg = round( dcent, 2 ),
close = round( ccent, 2 ),
between = round( bcent, 2 )
)
# print the table
centralization.tabThe degree centralization score is 0.23 and the betweenness centralization score is 0.27. Both are somewhat close to 0 indicating that the nodes differ in their degree/betweenness centrality scores, but there is not substantial variation. Put differently, this network is more decentralized than it is hierarchical.
The closeness centralization score 0. But, this is because all of the scores for closeness are zero. As a result, there is no variation in the scores.